GO Archive

The GO Archive is a comprehensive collection of the ontology and annotations from 2004 onward. It replaces the former GO CVS, SVN and old archive.

Access the GO Archives here

About the contents of the GO Archive

GO release folder hierarchy

  • annotations
    contains the GO annotations as GAF files. GPAD and GPI files are available from March 2018, with the newer GO DOI releases
  • annotations/gp2protein (up to Feb, 2018)
    contains the files mapping gene products (usually MOD IDs) to proteins (UniProtKB accession numbers)
  • annotations/gp2rna (up to Feb, 2018)
    the equivalent of the gp2protein files for non-coding RNAs (mapping to RNAcentral IDs)
  • ontology
    contains the GO ontology (obo and owl files). We recommend ontology/go.obo (OBO format 1.2) if you don’t need to go back further than March 2009, and ontology/gene_ontology.obo (OBO format 1.0) if you need to go back to the beginning of the archive
  • ontology/extensions (from May, 2015)
    contains the various ontologies imported or produced by GO
  • ontology/external2go
    contains files mapping GO to other resources (e.g. InterPro, KEGG, Reactome)
  • ontology/subsets (from Oct, 2004)
    contains the GO slims used to simplify the ontology for specific purposes (e.g. goslim_synapse) or organisms (e.g. goslim_pombe). We recommend using the .obo files (2004-now) rather than the old, deprecated .go files (2004-2009)
  • mysql_dumps (up to Jan, 2017)
    contains the MySQL dumps of GO (e.g. -assocdb, -termdb)
  • products/annotations
    contains the GO annotations as provided by the MODs to the GO consortium. These files are kept for transparency, but we recommend using the GO annotations in annotations/ instead: the two sets can differ because of the filtering and quality checks performed by the GO consortium
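
The date ranges in the list above can be encoded as a small lookup, which may help when scripting against a particular archived release. The following Python sketch is illustrative only: the names FOLDER_WINDOWS, folder_available and recommended_ontology_file are hypothetical, and because the list gives months only, the day-of-month boundaries are assumptions.

```python
from datetime import date

# Availability windows transcribed from the folder list above.
# None means the list states no bound on that side.
# ASSUMPTION: the list gives only months, so the exact days are guesses.
FOLDER_WINDOWS = {
    "annotations/gp2protein": (None, date(2018, 2, 28)),
    "annotations/gp2rna": (None, date(2018, 2, 28)),
    "ontology/extensions": (date(2015, 5, 1), None),
    "ontology/subsets": (date(2004, 10, 1), None),
    "mysql_dumps": (None, date(2017, 1, 31)),
}


def folder_available(folder, release):
    """Return True if `folder` is expected in a release dated `release`."""
    start, end = FOLDER_WINDOWS.get(folder, (None, None))
    if start is not None and release < start:
        return False
    if end is not None and release > end:
        return False
    return True


def recommended_ontology_file(earliest_needed):
    """Pick the ontology file recommended above for a given time span.

    go.obo (OBO 1.2) is recommended when nothing earlier than March 2009
    is needed; gene_ontology.obo (OBO 1.0) covers the whole archive.
    """
    if earliest_needed >= date(2009, 3, 1):
        return "ontology/go.obo"
    return "ontology/gene_ontology.obo"
```

For example, recommended_ontology_file(date(2005, 6, 1)) selects ontology/gene_ontology.obo, while any date from March 2009 onward selects ontology/go.obo.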

GO subsets in the archive

The GO subsets from 2004 to ~2018 were deposited to give easy access to the GO slim used in a particular publication or analysis, and for reuse by the GO community at the time. Some of these GO slims are no longer maintained by their authors and may therefore contain obsolete GO terms; slims that are still maintained are indicated in bold. Although we recommend using the .obo files (consistent with our current releases), the old, deprecated .go files were kept in the archive. In .go files, parentage and relationships are indicated by indentation and punctuation characters (e.g. ‘%’ to indicate an is_a relationship).
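
To illustrate how indentation and a leading punctuation character can encode a hierarchy, here is a minimal Python sketch. It is not a full .go parser and rests on several assumptions: one term per line, one space per indentation level, and a "name ; GO:id" field layout. Only the ‘%’ symbol is taken from the text above; every other symbol is reported as "unknown". The sample lines and their GO IDs are made up for the example, and parse_go_flat is a hypothetical name.

```python
import re

# ASSUMED line layout: <indent><symbol><term name> ; GO:<7 digits>
LINE_RE = re.compile(
    r"^(?P<indent> *)(?P<symbol>[^\w\s])(?P<name>.*?) ; (?P<id>GO:\d{7})"
)

# Only '%' (is_a) is stated in the text; other symbols stay unmapped.
RELATION_SYMBOLS = {"%": "is_a"}


def parse_go_flat(lines):
    """Recover (term, parent, relation) records from indented lines."""
    terms = []
    stack = []  # (depth, term id) pairs for the current ancestor chain
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like a term line
        depth = len(match["indent"])
        # Drop entries at the same or deeper level: they are not ancestors.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parent = stack[-1][1] if stack else None
        stack.append((depth, match["id"]))
        terms.append({
            "id": match["id"],
            "name": match["name"].strip(),
            "relation": RELATION_SYMBOLS.get(match["symbol"], "unknown"),
            "parent": parent,
        })
    return terms


# Made-up sample (hypothetical names and IDs, not copied from the archive).
SAMPLE = [
    "$root term ; GO:9999991",
    " %child term ; GO:9999992",
    "  %grandchild term ; GO:9999993",
    " %second child ; GO:9999994",
]
```

Calling parse_go_flat(SAMPLE) yields one record per line, with each term's parent taken from the nearest preceding line that is less indented.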

If you are looking for current, actively maintained GO slims, please see the guide to GO subsets.

Topic / Usage | Information
Generic GO slim | Suparna Mundodi and Amelia Ireland Aug 2002
Aspergillus | Subset for Aspergillus
Drosophila | M. Adams, M. Ashburner, G.M. Rubin, S.E. Lewis et al.; Adams et al., PMID:10731132 Mar 2000
Glossina ESTs | M. Berriman Sep 2002
Honey bee ESTs | C.W. Whitfield, M.R. Band, M.F. Bonaldo, C.G. Kumar, L. Liu, J.R. Pardinas, H.M. Robertson, M.B. Soares, G.E. Robinson, PMID:11923340 Apr 2002
Mouse | The RIKEN Genome Exploration Group Phase II Team and the FANTOM Consortium PMID:11217851 Feb 2001
P. falciparum | M. Berriman July 2002
Plant | Suparna Mundodi Dec 2002
Prokaryotic subset | GO curators. Replaced by taxon constraints.
Rice (Beijing) | J. Yu et al. PMID:11935017 Apr 2002
Rice (Syngenta) | J. Yu et al. PMID:11935018 Apr 2002
UniProtKB-GOA | N. Mulder, M. Pruess PMID:12230037 Nov 2002
Yeast | SGD curators Aug 2003
Do not manually annotate | The set of high level terms that are useful for grouping, but should have no direct annotations except from automated tools

Deprecated formats

Deprecated Ontology formats

GO currently provides the Gene Ontology in the OBO 1.2 format (as produced by the OWL API) and other formats; see the ontology download page for more information about current ontology file formats. Several file formats may exist in the archives that are no longer supported by GO:

  • Flat file format: deprecated in 2009.
  • OBO-XML and FASTA files: retired in 2018.
    OBO-XML was a direct XML serialization of the OBO 1.2 format specification. The schema is specified using RELAX-NG compact syntax: obo-xml.rnc.
  • RDF-XML and OWL (old mapping) formats: retired in early 2021.
    For users of the GO-RDF/XML version of the ontology, we recommend the OWL RDF/XML version. For OWL users, we continue to support the legacy obo2owl translation, but users are strongly encouraged to switch to the new translation.
  • OBO 1.0 file format: previous iteration of the OBO format, retired in 2018.

Deprecated Annotation formats

GO currently provides annotations in GAF 2.2 as well as GPAD/GPI 2.0. See the annotation download page for more information about current annotation file formats.

  • GPAD 1.0, 1.1 & 1.2: deprecated as of 09-2024.
  • GPI 1.1 & 1.2: deprecated as of 09-2024.
  • GO-CAMs were briefly available as SIF (Simple Interaction Format) files; support for these ended in 2024.
  • GAF 2.0: deprecated as of 03-2021.
  • GAF 1.0: deprecated as of 06-2010.
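
When working with annotation files pulled from older releases, it can help to check which GAF version a file declares. The Python sketch below assumes the file carries a "!gaf-version:" header line among its leading "!" comment lines, which very old files may lack. The function names are hypothetical, and the deprecation dates are simply the ones listed above.

```python
import re

# Matches a GAF header line such as "!gaf-version: 2.2".
GAF_HEADER_RE = re.compile(r"^!gaf-version:\s*(\d+\.\d+)")

# Deprecation dates (MM-YYYY) transcribed from the list above.
DEPRECATED_GAF = {"2.0": "03-2021", "1.0": "06-2010"}


def gaf_version(lines):
    """Return the declared GAF version, or None if no header is found.

    Only the leading block of '!' comment lines is inspected.
    """
    for line in lines:
        if not line.startswith("!"):
            break  # header block is over; data rows follow
        match = GAF_HEADER_RE.match(line)
        if match:
            return match.group(1)
    return None


def deprecation_note(version):
    """Return a short note if `version` is listed as deprecated, else None."""
    when = DEPRECATED_GAF.get(version)
    if when is None:
        return None
    return f"GAF {version} was deprecated as of {when}"
```

A return value of None from gaf_version means only that no version header was found, not that the file is invalid.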

How the GO Archive was built

The archive was generated from data scattered across three legacy systems: the GO CVS, the GO SVN and the old product archive. These systems were created at different times to serve different purposes, and they were partially redundant, both in the types of data they contained and in the time frames they covered (e.g. SVN was maintained from 2011 to 2018 while CVS was maintained from 2002 to 2018). The project is hosted on GitHub.

“Modern” GO releases (March 2018-present)

  • March 2018 - January 2023
    In addition to the folder hierarchy described above, the GO DOI releases produced from March 2018 contain additional folders. These folders are only useful to the few people who want or need to reproduce a GO release, for instance using the set of programs (bin/) and libraries (lib/) available at the time of the release. These additional folders were discontinued starting with the March 2023 release.
  • Oct 2019 - present
    GO provides various statistics files in release_stats/.

Please contact the GO Helpdesk if you have any questions.